Abstract

The expectation-maximization (EM) algorithm is popular in estimating parameters of various
statistical models. In this paper, we consider applications of the EM algorithm to the maximum
a posteriori (MAP) sequence decoding assuming that sources and channels are described by
hidden Markov models (HMMs). HMMs can accurately approximate a large variety of
communication channels with memory and, in particular, wireless fading channels with noise.
The direct maximization of the a posteriori probability (APP) is too complex. The EM algorithm
allows us to obtain the MAP sequence estimation iteratively. Since each step of the EM
algorithm increases the APP, the algorithm can improve performance of any decoding procedure.

1. INTRODUCTION

Maximum a posteriori (MAP) sequence decoding is optimum, because it minimizes the
probability of sequence error. [27] It is usually performed by the Viterbi algorithm and, most
recently, by the turbo decoding algorithm for a special class of codes. These algorithms are
directly applicable if communications channels are memoryless. However, they perform only an
approximate MAP decoding if channel errors are bursty, which is the case in wireless
communications due to fading. It is usually very difficult to find a sequence MAP estimate
directly for channels with memory. The expectation-maximization (EM) algorithm has been
successfully applied by Georghiades and Han, Zeger and Kobayashi, and Georghiades to decoding
information transmitted over the fading wireless channel. In these papers, the fading channel is
modeled by a complex Gaussian process.

Alternatively, the wireless channel can be modeled by hidden Markov models (HMMs).
[22,23] It can be shown that HMMs are general enough to approximate not only fading, but also
other types of signal distortion such as interference and non-Gaussian noise. [23] In this paper,
we demonstrate that the EM algorithm can be applied to MAP decoding if channel errors are
described by an HMM. In our approach, the signal parameters are obtained by applying the EM
algorithm to maximizing the a posteriori probability (APP) of the transmitted symbols.

In developing the MAP decoding algorithm it is usually assumed that the communication
channel is memoryless, which is achieved by interleaving. However, in the majority of real
channels the error dependence extends over long time intervals. Complete independence is
impossible to achieve due to the information delivery delay constraints and system memory
limitations. We should also remember that a memoryless channel has lower capacity than a
channel with memory with the same bit-error rate. [7,10] Therefore, it is important to develop
decoding algorithms for channels with memory.

There are many different models of channels with memory which correspond to various 
channel impairments such as fading, interference, and noise. All of them can be accurately 
approximated by hidden Markov models (HMMs). It can be shown that HMMs represent a dense 
family allowing us to approximate a large variety of stochastic processes. Their application in
many diverse fields (such as speech, image, and handwriting recognition, experimental genetics,
sociology, stock market modeling, etc.) serves as experimental evidence of their generality.
Another reason for their popularity is the relative simplicity of their use. 

In this paper we develop the MAP decoding algorithm for a general input-output HMM 
(IOHMM) which, as we will see, incorporates source and channel HMMs. For a special class
of models, the decoding can be performed using the Viterbi algorithm. However, in the general
case this algorithm gives only an approximate solution. We demonstrate that this approximate
solution can be improved by the expectation-maximization (EM) algorithm. The paper is
organized as follows. In section 2, we discuss the relationship between various decoding criteria. 
In section 3, we consider the IOHMMs and their application to computing the transmitted 
symbol sequence APP. In section 4, we develop the MAP EM decoding algorithm and illustrate 
its application to decoding of block and convolutional codes. 

2. MAP DECODING 

Suppose that we have an input-output system whose input is described by a sequence of
symbols $X_1^T = (X_1, X_2, \ldots, X_T)$ and the corresponding output is
$Y_1^T = (Y_1, Y_2, \ldots, Y_T)$. Our goal is to choose the most probable input which produced
the observed output $Y_1^T$.

The optimal estimator that maximizes the probability of correct decoding of $X_1^T$ is the
maximum a posteriori (MAP) estimator: [27]

$$\hat{X}_1^T = \arg\max_{X_1^T} \Pr(X_1^T \mid Y_1^T) \tag{1}$$

where $\Pr(\cdot)$ denotes the corresponding probability or probability density function. Since
the normalizing factor $\Pr(Y_1^T)$ does not depend on $X_1^T$, the MAP estimate can be obtained
by maximizing the unnormalized APP

$$\hat{X}_1^T = \arg\max_{X_1^T} \Pr(X_1^T, Y_1^T) = \arg\max_{X_1^T} \Pr(Y_1^T \mid X_1^T)\Pr(X_1^T). \tag{2}$$

It follows from this equation that the maximum likelihood (ML) and MAP estimates coincide if 
the input is uniformly distributed. This assumption is often made when there is no information 
about the input probability distribution. However, better results can be achieved if we
exploit the input (source) statistics. Since the ML estimate can be viewed as a special case of
the MAP estimate, we consider only the latter in the sequel. 
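
To make the relationship between the ML and MAP criteria of equation (2) concrete, the following minimal sketch enumerates all short binary input sequences and compares the two estimates. The memoryless binary symmetric channel, the i.i.d. source, and the names `posterior_scores` and `ml_and_map` are assumptions made only for this illustration; they are not part of the models considered in the paper.

```python
from itertools import product

def posterior_scores(y, prior, eps):
    """Return {x: (Pr(y|x), Pr(y|x)Pr(x))} for every binary sequence x.

    Assumed toy model: binary symmetric channel with crossover
    probability eps and an i.i.d. source with Pr(X_t = 1) = prior.
    """
    scores = {}
    for x in product((0, 1), repeat=len(y)):
        likelihood = 1.0
        p_x = 1.0
        for xt, yt in zip(x, y):
            likelihood *= eps if xt != yt else 1.0 - eps
            p_x *= prior if xt == 1 else 1.0 - prior
        scores[x] = (likelihood, likelihood * p_x)
    return scores

def ml_and_map(y, prior, eps):
    """Brute-force ML and MAP sequence estimates for the toy model."""
    scores = posterior_scores(y, prior, eps)
    ml = max(scores, key=lambda x: scores[x][0])   # maximizes Pr(y|x)
    mp = max(scores, key=lambda x: scores[x][1])   # maximizes Pr(y|x)Pr(x)
    return ml, mp

y = (1, 0, 1)
uniform = ml_and_map(y, prior=0.5, eps=0.2)    # uniform source
skewed = ml_and_map(y, prior=0.05, eps=0.2)    # source that rarely emits 1
print(uniform, skewed)
```

With the uniform source the two estimates coincide, while the skewed source pulls the MAP estimate away from the ML estimate, which is the effect of exploiting the source statistics.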

It is often necessary to estimate a subset of the input sequence and in particular a single 
symbol: 

$$\hat{X}_t = \arg\max_{X_t} \Pr(X_t \mid Y_1^T). \tag{3}$$

In many applications the estimation must be performed before receiving the whole sequence
$Y_1^T$. If estimation of $X_t$ is made on the basis of $Y_1^{t+\tau}$ with $\tau > 0$, it is called
fixed-lag smoothing; if $\tau = 0$, it is called filtering; if $\tau < 0$, it is called prediction.
To solve the previous equations we
need to develop algorithms for calculating the corresponding probability measures and their 
maximization. This is done by modeling the input-output system. 

3. HIDDEN MARKOV MODELS 

An input-output HMM $\lambda = (S, X, Y, \boldsymbol{\pi}, \{\mathbf{P}(x,y)\})$ is defined by its
internal states $S = \{1, 2, \ldots, n\}$, inputs $X$, outputs $Y$, initial state probability vector
$\boldsymbol{\pi}$, and the input-output probability density matrices (PDM) $\mathbf{P}(x,y)$,
$x \in X$, $y \in Y$, whose elements $p_{ij}(x,y) = \Pr(j, x, y \mid i)$ are the conditional
probability density functions (PDF) of input $x$ and corresponding output $y$ after transferring
from state $i$ to state $j$. It is assumed that the state sequence $S_0^t = (S_0, S_1, \ldots, S_t)$,
input sequence $X_1^t = (X_1, X_2, \ldots, X_t)$, and output sequence $Y_1^t = (Y_1, Y_2, \ldots, Y_t)$
possess the following Markovian property

$$\Pr(S_t, X_t, Y_t \mid S_0^{t-1}, X_1^{t-1}, Y_1^{t-1}) = \Pr(S_t, X_t, Y_t \mid S_{t-1}).$$

According to this model, the PDF of the input sequence $X_1^T$ and output sequence $Y_1^T$ has the
form [23]

$$p_T(X_1^T, Y_1^T) = \boldsymbol{\pi} \prod_{i=1}^{T} \mathbf{P}(X_i, Y_i)\, \mathbf{1} \tag{4}$$

where $\mathbf{1}$ is a column vector of $n$ ones.
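
The matrix product of equation (4) can be evaluated with one vector-matrix multiplication per symbol. The sketch below is only an illustration: the two-state model, its numerical values (`A`, `ERR`, `PI`), and the helper names are hypothetical and chosen so that the matrices $\mathbf{P}(x,y)$ sum to a stochastic matrix.

```python
from itertools import product

def vec_mat(v, m):
    """Row vector times matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]

def joint_probability(pi, pdm, xs, ys):
    """Evaluate pi * P(x_1,y_1) * ... * P(x_T,y_T) * 1 as in equation (4)."""
    v = list(pi)
    for x, y in zip(xs, ys):
        v = vec_mat(v, pdm[(x, y)])
    return sum(v)  # multiplication by the column vector of ones

# Hypothetical two-state IOHMM: equiprobable binary input, state-dependent
# bit error probability ERR[j] in the state j entered by the transition.
A = [[0.9, 0.1], [0.3, 0.7]]     # state transition probabilities
ERR = [0.01, 0.2]
PI = [0.75, 0.25]                # stationary distribution of A
PDM = {
    (x, y): [[A[i][j] * 0.5 * (ERR[j] if x != y else 1.0 - ERR[j])
              for j in range(2)] for i in range(2)]
    for x in (0, 1) for y in (0, 1)
}

example = joint_probability(PI, PDM, (0, 1, 1), (0, 1, 0))
# Sanity check: the probabilities of all length-3 input/output pairs sum to one.
total = sum(joint_probability(PI, PDM, xs, ys)
            for xs in product((0, 1), repeat=3)
            for ys in product((0, 1), repeat=3))
print(example, total)
```

The cost is linear in $T$ for a fixed pair of sequences; the difficulty addressed later in the paper is the maximization over all input sequences, not the evaluation itself.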

If the source sequence is modeled by an autonomous HMM with states $S_t^{(s)}$ and PDM
$\mathbf{P}_s(X) = [\Pr(S_t^{(s)}, X \mid S_{t-1}^{(s)})]_{n_s, n_s}$ and, for each input sequence,
the channel is modeled by the conditional HMM with states $S_t^{(c)}$ and PDM
$\mathbf{P}_c(Y \mid X) = [\Pr(S_t^{(c)}, Y \mid S_{t-1}^{(c)}, X)]_{n_c, n_c}$, then the
combined IOHMM is described by the PDM of the form

$$\mathbf{P}(X, Y) = \mathbf{P}_s(X) \otimes \mathbf{P}_c(Y \mid X) \tag{5}$$

where $\otimes$ denotes the matrix Kronecker product. In other words, the combined model is an HMM
whose states represent all possible combinations of the transmitted sequence states and channel 
states. 
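
As an illustration of equation (5), the following sketch forms a Kronecker product of two small matrices. The function name `kron` and the integer test matrices are illustrative assumptions; in the paper's setting the first factor would be $\mathbf{P}_s(X)$ and the second $\mathbf{P}_c(Y \mid X)$, so the result has $n_s n_c$ rows and columns.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows.

    Block (i, j) of the result equals a[i][j] * b, so the combined state
    index is (source state) * len(b) + (channel state).
    """
    return [
        [a_ij * b_kl for a_ij in a_row for b_kl in b_row]
        for a_row in a for b_row in b
    ]

source = [[1, 2], [3, 4]]     # stands in for P_s(X)
channel = [[0, 1], [1, 0]]    # stands in for P_c(Y|X)
combined = kron(source, channel)
for row in combined:
    print(row)
```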



In the most popular model, the conditional output PDM has the form 

$$\mathbf{P}_c(Y \mid X) = \mathbf{P}_c \mathbf{B}_c(Y \mid X) \tag{6}$$

where $\mathbf{P}_c = [p_{ij}]_{n_c, n_c}$ is the channel state transition probability matrix and
$\mathbf{B}_c(Y \mid X)$ is a diagonal matrix of the state output probabilities. For example,
according to the Gilbert-Elliott [3,10] model, the channel has two states: "good" and "bad". In
the good state, errors occur with small probability $b_1$ while in the bad state they occur with
larger probability $b_2$. If we assume that the
first state is good and the second state is bad, then the model is described by equation (6) with 



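$$\mathbf{B}_c(X \mid X) = \begin{bmatrix} 1-b_1 & 0 \\ 0 & 1-b_2 \end{bmatrix}, \qquad
\mathbf{B}_c(\bar{X} \mid X) = \begin{bmatrix} b_1 & 0 \\ 0 & b_2 \end{bmatrix} \tag{7}$$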



where $\bar{X}$ denotes the complement of $X$. The two-state model is the simplest HMM for channels
with memory. Models with larger state space are often needed. [1,9,23] There are several HMMs
for fading channels with additive white Gaussian noise (AWGN). In these models, the HMM
states are usually associated with the channel fade levels, while the state output conditional
probability density functions are Gaussian [13,18,22,23]

$$b_k(Y_t \mid X_t) \sim \exp(-\| Y_t - a_k X_t \|^2 / N_0). \tag{8}$$
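
Returning to the two-state Gilbert-Elliott model of equations (6) and (7), the following sketch shows how the matrices $\mathbf{P}_c \mathbf{B}_c$ capture error bursts: the probability of an error immediately after another error is much larger than the average bit error rate. The parameter values, the stationary initial distribution, and the helper names are assumptions made for illustration.

```python
def mat_mul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def gilbert_elliott(p, b):
    """Return (P_c * diag(1-b), P_c * diag(b)): the 'no error' and 'error' matrices."""
    n = len(p)
    ok = [[p[i][j] * (1.0 - b[j]) for j in range(n)] for i in range(n)]
    err = [[p[i][j] * b[j] for j in range(n)] for i in range(n)]
    return ok, err

def pattern_probability(pi, mats):
    """pi * M_1 * ... * M_k * 1 for a sequence of error/no-error matrices."""
    v = [list(pi)]
    for m in mats:
        v = mat_mul(v, m)
    return sum(v[0])

P = [[0.997, 0.003], [0.034, 0.966]]   # illustrative state transitions
B = [0.001, 0.16]                      # illustrative error rates b_1, b_2
PI = [0.034 / 0.037, 0.003 / 0.037]    # stationary distribution of P

ok, err = gilbert_elliott(P, B)
ber = pattern_probability(PI, [err])              # Pr(error)
two_errors = pattern_probability(PI, [err, err])  # Pr(two consecutive errors)
cond = two_errors / ber                           # Pr(error | previous error)
print(ber, cond)
```

For a memoryless channel `cond` would equal `ber`; the large gap between the two is what a decoder can exploit when it models the channel memory instead of removing it by interleaving.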



Let us now consider modeling the transmitted sequence by HMMs. Usually, the source
HMM is obtained by fitting it to experimental data. This process is called HMM training.

If the transmitted sequence is generated by a trellis coded modulator (TCM), [1,24] then it
can be described as a finite state machine:

$$S_{t+1}^{(s)} = f_t(S_t^{(s)}, I_t), \qquad X_t = g_t(S_t^{(s)}, I_t). \tag{9}$$

After receiving an information symbol $I_t$ in state $S_t^{(s)}$ the machine transfers to state
$S_{t+1}^{(s)} = f_t(S_t^{(s)}, I_t)$ and outputs a symbol $X_t = g_t(S_t^{(s)}, I_t)$. The system is
called TCM, because its state transition graph resembles a trellis whose nodes represent the
states for $t = 1, 2, \ldots, T$ and edges represent all possible state transitions. In a typical
implementation, the first equation in (9) represents a convolutional encoder while the second
equation represents a modulator.

It follows from this description that the modulator output symbols are not independent, even 
if the source symbols are independent. From a statistical point of view, TCM can be described 
as a method of creating correlated sequences that are resistant to channel errors, allowing us to 
recover the source sequences with high reliability. It follows from the TCM definition that if the 
source symbols can be modeled by an HMM, the output process is also an HMM. 

4. MAP SEQUENCE DECODING VIA THE EM ALGORITHM 

If the channel is modeled by an IOHMM, the MAP estimate of the sequence $X_1^T$ maximizes
the right hand side of equation (4). However, as we can see, the direct maximization is a
difficult problem. If $X_t$ is a discrete variable, then we need to consider all possible sequences
$X_1^T$. Thus, the complexity grows exponentially with $T$. In the special case, when the input
sequence $X_1^T$ is uniquely determined by the sequence of states $S_0^T$, the maximum can be found
by the Viterbi algorithm. In this paper, we show that the general problem can be solved
iteratively using the EM algorithm which, in our case, is a combination of the forward-backward
and Viterbi algorithms. [2,16] The EM algorithm converges monotonically; therefore, even a
single step of the EM algorithm allows us to improve performance of any decoding algorithm.

To develop the algorithm, we define the complete data probability distribution as
[23, Sec. 3.2.2]

$$\Psi(z, X_1^T, Y_1^T) = \pi_{i_0} \prod_{t=1}^{T} p_{i_{t-1} i_t}(X_t, Y_t) \tag{10}$$

where $z = i_0^T$ is the HMM state sequence and $p_{ij}(X, Y)$ are the elements of the matrix
$\mathbf{P}(X, Y)$. The MAP sequence estimate of equation (2) can be obtained iteratively by the
following EM algorithm [2] [23]

$$X_{1,p+1}^T = \arg\max_{X_1^T} Q(X_1^T, X_{1,p}^T), \quad p = 0, 1, 2, \ldots \tag{11}$$

where $Q(X_1^T, X_{1,p}^T)$ is the auxiliary function which, in our case, can be written as

$$Q(X_1^T, X_{1,p}^T) = \sum_{z} \Psi(z, X_{1,p}^T, Y_1^T) \log \Psi(z, X_1^T, Y_1^T).$$

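Using the relationship between equations (10) and (4) we can rewrite the auxiliary function in
the following form [23]

$$Q(X_1^T, X_{1,p}^T) = \sum_{t=1}^{T} \sum_{i=1}^{n} \sum_{j=1}^{n} \gamma_{t,ij}(X_{1,p}^T) \log p_{ij}(X_t, Y_t) + C \tag{12}$$

where $C$ does not depend on $X_1^T$,

$$\gamma_{t,ij}(X_{1,p}^T) = \alpha_i(X_{1,p}^{t-1}, Y_1^{t-1})\, p_{ij}(X_{t,p}, Y_t)\, \beta_j(X_{t+1,p}^T, Y_{t+1}^T),$$

and $\alpha_i(X_{1,p}^{t-1}, Y_1^{t-1})$ and $\beta_j(X_{t+1,p}^T, Y_{t+1}^T)$ are the elements of the
following forward and backward probability vectors

$$\boldsymbol{\alpha}(X_1^t, Y_1^t) = \boldsymbol{\pi} \prod_{i=1}^{t} \mathbf{P}(X_i, Y_i) \quad \text{and} \quad
\boldsymbol{\beta}(X_t^T, Y_t^T) = \prod_{i=t}^{T} \mathbf{P}(X_i, Y_i)\, \mathbf{1}. \tag{13}$$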

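It follows from these equations that the forward probability vectors can be evaluated by the
forward algorithm

$$\boldsymbol{\alpha}(X_1^0, Y_1^0) = \boldsymbol{\pi}, \qquad \boldsymbol{\alpha}(X_1^t, Y_1^t) = \boldsymbol{\alpha}(X_1^{t-1}, Y_1^{t-1})\, \mathbf{P}(X_t, Y_t) \tag{14}$$

and the backward probability vectors can be evaluated by the backward algorithm

$$\boldsymbol{\beta}(X_{T+1}^T, Y_{T+1}^T) = \mathbf{1}, \qquad \boldsymbol{\beta}(X_t^T, Y_t^T) = \mathbf{P}(X_t, Y_t)\, \boldsymbol{\beta}(X_{t+1}^T, Y_{t+1}^T). \tag{15}$$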

As we can see, the auxiliary function $Q(X_1^T, X_{1,p}^T)$ is much simpler than $p_T(X_1^T, Y_1^T)$.
In the majority of practical cases it is not difficult to find a maximum of the auxiliary
function. For example, if a source-coded speech signal, which is modeled by an HMM with a mixture
of Gaussian state-conditional PDFs, is transmitted over a slowly fading channel with additive
Gaussian noise, it is possible to find a closed-form solution of equation (11). However, the size
of the matrix $\mathbf{P}(X_t, Y_t)$ might be large. The dimensionality of the previous equations
can be reduced if this matrix has a special form.
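
The forward and backward recursions of equations (14) and (15) can be sketched as follows. The three matrices in `MATS` stand in for $\mathbf{P}(X_t, Y_t)$ along one fixed pair of sequences; their values, the initial vector, and the function names are hypothetical. The check at the end uses the fact that $\boldsymbol{\alpha}(X_1^t, Y_1^t)\,\boldsymbol{\beta}(X_{t+1}^T, Y_{t+1}^T)$ equals $p_T(X_1^T, Y_1^T)$ for every $t$.

```python
def forward(pi, mats):
    """alphas[t] = pi * P_1 * ... * P_t (row vectors), alphas[0] = pi."""
    alphas = [list(pi)]
    for m in mats:
        prev = alphas[-1]
        alphas.append([sum(prev[i] * m[i][j] for i in range(len(prev)))
                       for j in range(len(m[0]))])
    return alphas

def backward(mats):
    """betas[t] = P_{t+1} * ... * P_T * 1 (column vectors), betas[T] = 1."""
    n = len(mats[-1])
    betas = [[1.0] * n]
    for m in reversed(mats):
        nxt = betas[0]
        betas.insert(0, [sum(m[i][j] * nxt[j] for j in range(n)) for i in range(n)])
    return betas

PI = [0.75, 0.25]
MATS = [
    [[0.45, 0.04], [0.15, 0.28]],
    [[0.005, 0.01], [0.002, 0.07]],
    [[0.45, 0.04], [0.15, 0.28]],
]

alphas = forward(PI, MATS)
betas = backward(MATS)
# The same sequence probability is obtained at every split point t.
values = [sum(a * b for a, b in zip(alphas[t], betas[t])) for t in range(len(MATS) + 1)]
print(values)
```

Storing all forward vectors and then sweeping backward is what makes the coefficients $\gamma_{t,ij}$ of equation (12) available for every $t$ at a cost linear in $T$.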
5. MAP DECODING OF TCM SIGNALS ON CHANNELS WITH MEMORY

For a TCM system with an i.i.d. information sequence, equation (5) takes the form

$$\mathbf{P}(X_t, Y_t) = \left[ p_{S_t^{(s)} S_{t+1}^{(s)}}\, \mathbf{P}_c(Y_t \mid X_t) \right] \tag{16}$$

where

$$p_{S_t^{(s)} S_{t+1}^{(s)}} = \begin{cases} \Pr(I_t) & \text{if } S_{t+1}^{(s)} = f_t(S_t^{(s)}, I_t) \\ 0 & \text{otherwise} \end{cases} \tag{17}$$

which means that the matrix $\mathbf{P}(X_t, Y_t)$ is sparse. In this case, we can write

$$\Pr(I_1^T, Y_1^T) = \boldsymbol{\pi}_c \prod_{t=1}^{T} p_{S_t^{(s)} S_{t+1}^{(s)}}\, \mathbf{P}_c(Y_t \mid X_t)\, \mathbf{1}.$$

Here and in the following equations, $\boldsymbol{\pi}_c$ is a vector of the initial probabilities of
the channel states, $X_t = g_t(S_t^{(s)}, I_t)$, and the product is taken along the state trajectory
$S_{t+1}^{(s)} = f_t(S_t^{(s)}, I_t)$ for $t = 1, 2, \ldots, T$. Thus, for the TCM system, instead of a
large sparse matrix $\mathbf{P}(X_t, Y_t)$, we can use smaller matrices $\mathbf{P}_c(Y_t \mid X_t)$
for computing the APP.

If we assume that all the information symbols are equiprobable, then the MAP estimate is
equivalent to the ML estimate:

$$\hat{I}_1^T = \arg\max_{I_1^T} \boldsymbol{\pi}_c \prod_{t=1}^{T} \mathbf{P}_c(Y_t \mid X_t)\, \mathbf{1}.$$



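The auxiliary function can also be written in terms of the smaller matrices:

$$Q(I_1^T, I_{1,p}^T) = \sum_{t=1}^{T} \sum_{i=1}^{n_c} \sum_{j=1}^{n_c} \gamma_{t,ij}(I_{1,p}^T) \log p_{c,ij}(Y_t \mid X_t) + C \tag{18}$$

where $X_t = g_t(S_t^{(s)}, I_t)$ and

$$\gamma_{t,ij}(I_{1,p}^T) = \alpha_i(Y_1^{t-1} \mid I_{1,p}^{t-1})\, p_{c,ij}(Y_t \mid X_{t,p})\, \beta_j(Y_{t+1}^T \mid I_{t+1,p}^T). \tag{19}$$

Here $\alpha_i(Y_1^{t-1} \mid I_{1,p}^{t-1})$ and $\beta_j(Y_{t+1}^T \mid I_{t+1,p}^T)$ are the elements
of the forward and backward probability vectors

$$\boldsymbol{\alpha}(Y_1^t \mid I_1^t) = \boldsymbol{\pi}_c \prod_{i=1}^{t} \mathbf{P}_c(Y_i \mid X_i) \quad \text{and} \quad
\boldsymbol{\beta}(Y_t^T \mid I_t^T) = \prod_{i=t}^{T} \mathbf{P}_c(Y_i \mid X_i)\, \mathbf{1} \tag{20}$$

which can be computed by the forward and backward algorithms, respectively.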

It follows from equation (18) that we can apply the Viterbi algorithm with the branch metric

$$m(I_t) = \sum_{i=1}^{n_c} \sum_{j=1}^{n_c} \gamma_{t,ij}(I_{1,p}^T) \log p_{c,ij}(Y_t \mid X_t), \quad t = 1, 2, \ldots, T \tag{21}$$

to find a maximum of $Q(I_1^T, I_{1,p}^T)$, which can be interpreted as a longest path leading from
the zero state to one of the states $S_{T+1}^{(s)}$. (Note that we are considering only the encoder
trellis.) It is convenient to use the Viterbi algorithm in the backward direction to combine it
with the forward-backward algorithm. In the forward direction, we compute and save in memory
$\boldsymbol{\alpha}(Y_1^t \mid I_{1,p}^t)$ for $t = 1, 2, \ldots, T$; in the backward direction, we
compute recursively $\boldsymbol{\beta}(Y_{t+1}^T \mid I_{t+1,p}^T)$ for $t = T-1, T-2, \ldots, 1$,
then we compute $\gamma_{t,ij}(I_{1,p}^T)$ and, for each encoder state at time $t$, determine the
longest path starting from this state with branch metric $m(I_t)$ and the corresponding input
sequence generating this path. Thus, the described EM algorithm can be summarized as follows:

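1. Select initial sequence $I_{1,0}^T = (I_{1,0}, I_{2,0}, \ldots, I_{T,0})$.

2. Forward part:

Set $\boldsymbol{\alpha}(Y_1^0 \mid I_1^0) = \boldsymbol{\pi}_c$ and for $t = 1, 2, \ldots, T$ compute $X_{t,p} = g_t(S_t^{(s)}, I_{t,p})$,

$$\boldsymbol{\alpha}(Y_1^t \mid I_{1,p}^t) = \boldsymbol{\alpha}(Y_1^{t-1} \mid I_{1,p}^{t-1})\, \mathbf{P}_c(Y_t \mid X_{t,p}) \tag{22}$$

3. Backward part:

Set $\boldsymbol{\beta}(Y_{T+1}^T \mid I_{T+1,p}^T) = \mathbf{1}$ and the survivors [4] lengths $L(S_{T+1}^{(s)}) = 0$ for $S_{T+1}^{(s)} = 1, 2, \ldots, n_s$.
For $t = T, T-1, \ldots, 1$ compute $X_t = g_t(S_t^{(s)}, I_t)$,

$$L(S_t^{(s)}) = \max_{I_t} \{ L[f_t(S_t^{(s)}, I_t)] + m(I_t) \}$$

$$\hat{I}_t(S_t^{(s)}) = \arg\max_{I_t} \{ L[f_t(S_t^{(s)}, I_t)] + m(I_t) \}$$

$$\boldsymbol{\beta}(Y_t^T \mid I_{t,p}^T) = \mathbf{P}_c(Y_t \mid X_{t,p})\, \boldsymbol{\beta}(Y_{t+1}^T \mid I_{t+1,p}^T) \tag{23}$$

4. Reestimate the information sequence:

$$I_{t,p+1} = \hat{I}_t(\hat{S}_t), \qquad \hat{S}_{t+1} = f_t(\hat{S}_t, I_{t,p+1}), \qquad t = 1, 2, \ldots, T$$

where $\hat{S}_1 = 0$.

5. If $I_{1,p+1}^T \neq I_{1,p}^T$, go to step 2; otherwise decode the information sequence as $I_{1,p+1}^T$.
Note that if the terminal state is forced to be zero, then we set $L(S_{T+1}^{(s)}) = -\infty$ for $S_{T+1}^{(s)} \neq 0$.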
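
The iteration summarized in steps 1 to 5 can be sketched for a very small code as follows. This is not the implementation described in the paper: to keep the example short, the maximization of the auxiliary function (18) is done by enumerating all codewords instead of running the backward Viterbi recursion, the codewords are assumed equiprobable, and the channel parameters, the stationary initial distribution, and all function names are assumptions. The code is the (5,3) block code used as an example later in this section.

```python
from itertools import product
from math import log

P = [[0.997, 0.003], [0.034, 0.966]]            # assumed channel state transitions
B = [0.001, 0.16]                               # assumed per-state bit error rates
PI = [0.034 / 0.037, 0.003 / 0.037]             # assumed (stationary) initial vector
H = [(1, 0), (1, 1), (0, 1), (1, 0), (0, 1)]    # parity check columns h_1..h_5

def pc(y, x):
    """P_c(y|x) = P * B_c(y|x) for the two-state Gilbert-Elliott model."""
    return [[P[i][j] * (B[j] if x != y else 1.0 - B[j]) for j in range(2)]
            for i in range(2)]

def codewords():
    return [c for c in product((0, 1), repeat=5)
            if all(sum(c[t] * H[t][r] for t in range(5)) % 2 == 0 for r in range(2))]

def forward_backward(y, x):
    mats = [pc(yt, xt) for yt, xt in zip(y, x)]
    alpha = [PI]                                 # alpha[t]: after t symbols
    for m in mats:
        alpha.append([sum(alpha[-1][i] * m[i][j] for i in range(2)) for j in range(2)])
    beta = [[1.0, 1.0]]                          # beta[t]: symbols t+1..T
    for m in reversed(mats):
        beta.insert(0, [sum(m[i][j] * beta[0][j] for j in range(2)) for i in range(2)])
    return mats, alpha, beta

def likelihood(y, x):
    return sum(forward_backward(y, x)[1][-1])

def em_step(y, x_p, candidates):
    """One EM iteration: build gamma from x_p, then maximize Q by enumeration."""
    mats, alpha, beta = forward_backward(y, x_p)

    def q(x):
        total = 0.0
        for t in range(len(y)):
            new = pc(y[t], x[t])
            for i in range(2):
                for j in range(2):
                    gamma = alpha[t][i] * mats[t][i][j] * beta[t + 1][j]
                    total += gamma * log(new[i][j])
        return total

    best = max(candidates, key=q)
    return x_p if q(x_p) >= q(best) else best    # keep x_p on ties

def em_decode(y, x0, max_iter=20):
    cands = codewords()
    x = tuple(x0)
    for _ in range(max_iter):
        x_next = em_step(y, x, cands)
        if x_next == x:
            break
        x = x_next
    return x

y = (0, 1, 0, 1, 0)
x0 = (0, 1, 0, 1, 1)
x_hat = em_decode(y, x0)
print(x_hat, likelihood(y, x0), likelihood(y, x_hat))
```

Because each iteration cannot decrease the likelihood, the final estimate is at least as likely as the initial guess; replacing the enumeration by the Viterbi recursion of step 3 changes only the cost of the maximization, not its result.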

It is clear from the algorithm's description that the number of operations required by the
algorithm is $k = N_I \cdot N_{FB} \cdot N_V$ where $N_{FB}$ is the number of operations required
by the forward-backward algorithm, $N_V$ is the number of operations required by the Viterbi
algorithm, and $N_I$ is the number of iterations. The number of iterations is a random number
which depends on the initial approximation of the decoded sequence. Because of the discrete nature
of the decoded sequence, the number of iterations is usually small. In our simulation study, for
the code described in Sec. 5.2 and the channel of Ref. [10], we compared the EM algorithm with
exhaustive search (for short sequences). It took less than three iterations for the algorithm to
converge for various initial guesses.

The forward-backward algorithm requires saving in memory all the forward probability 
vectors. The backward part of the algorithm starts only after the whole sequence Y\ has been 
received. This can cause a significant problem if the sequence is long. The problem can be 
solved by using an approximate forward-only fixed-lag algorithm. By combining the fixed-lag
algorithm with the above EM algorithm it is possible to perform the decoding in the
forward-only fashion. The combined algorithm lends itself to parallelization and pipelining.

This algorithm can be applied to a general IOHMM. For special cases of the model the branch
metric can be simplified. In particular, if the model parameters are given by equation (6), we can
write

$$m(I_t) = \sum_{i=1}^{n_c} \gamma_{t,i}(I_{1,p}^T) \log b_i(Y_t \mid X_t), \quad t = 1, 2, \ldots, T \tag{24}$$

where

$$\gamma_{t,i}(I_{1,p}^T) = \alpha_i(Y_1^t \mid I_{1,p}^t)\, \beta_i(Y_{t+1}^T \mid I_{t+1,p}^T).$$

For the fading channel with AWGN the algorithm can be simplified. Substituting equation (8)
into equation (18) we conclude that the maximization of $Q(I_1^T, I_{1,p}^T)$ is equivalent to
minimization of the following quadratic form

$$R(I_1^T, I_{1,p}^T) = \sum_{t=1}^{T} \sum_{i=1}^{n_c} \gamma_{t,i}(I_{1,p}^T)\, \| Y_t - a_i X_t \|^2 \tag{25}$$

which can be accomplished by the Viterbi algorithm with the branch metric

$$m(I_t) = \sum_{i=1}^{n_c} \gamma_{t,i}(I_{1,p}^T)\, \| Y_t - a_i X_t \|^2, \quad t = 1, 2, \ldots, T. \tag{26}$$

For a PSK modulated sequence, we can write

$$R(I_1^T, I_{1,p}^T) = -2 \sum_{t=1}^{T} \sum_{i=1}^{n_c} \gamma_{t,i}(I_{1,p}^T)\, \mathrm{Re}\{ X_t Y_t^* a_i \} + C \tag{27}$$

where $C$ is independent of $I_1^T$. Thus, we can use a simpler metric

$$m(I_t) = \sum_{i=1}^{n_c} \gamma_{t,i}(I_{1,p}^T)\, \mathrm{Re}\{ X_t Y_t^* a_i \} \tag{28}$$

in the Viterbi algorithm.
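
The step from the quadratic metric (26) to the correlation metric (28) relies on the PSK symbols having constant magnitude. The following sketch checks the identity numerically for a single symbol interval; the received sample, the fade levels, the weights, and the function names are hypothetical values chosen for illustration.

```python
import cmath
from math import pi

def quadratic_metric(y, x, gammas, gains):
    """sum_i gamma_i |y - a_i x|^2, the metric of equation (26), to be minimized."""
    return sum(g * abs(y - a * x) ** 2 for g, a in zip(gammas, gains))

def correlation_metric(y, x, gammas, gains):
    """sum_i gamma_i Re{x y* a_i}, the metric of equation (28), to be maximized."""
    return sum(g * (x * y.conjugate() * a).real for g, a in zip(gammas, gains))

AMPLITUDE = 1.0
QPSK = [AMPLITUDE * cmath.exp(1j * (pi / 4 + k * pi / 2)) for k in range(4)]

y = 0.9 - 1.1j                 # hypothetical received sample
gammas = [0.7, 0.3]            # hypothetical weights gamma_{t,i}
gains = [1.0, 0.3 + 0.1j]      # hypothetical fade levels a_i

# Term that does not depend on the transmitted PSK symbol.
constant = sum(g * (abs(y) ** 2 + abs(a) ** 2 * AMPLITUDE ** 2)
               for g, a in zip(gammas, gains))

best_quadratic = min(QPSK, key=lambda x: quadratic_metric(y, x, gammas, gains))
best_correlation = max(QPSK, key=lambda x: correlation_metric(y, x, gammas, gains))
print(best_quadratic, best_correlation)
```

Since the quadratic form equals the constant minus twice the correlation for every candidate symbol, minimizing one is the same as maximizing the other, and the correlation needs no squaring.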

To improve the algorithm's convergence rate, it is important to select a good initial 
approximation of the decoded sequence. This can be achieved by using one of the suboptimal 
decoding algorithms. For example, we can use the symbol MAP estimate according to equation
(3), the Viterbi algorithm for the most probable path, [15] or an algebraic decoding algorithm.

To illustrate the algorithm applications, we consider two examples. 

5.1 MAP Decoding of Block Codes

It is well known [26] that block codes can be interpreted as trellis codes in the following
way. Let $H = [h_1, h_2, \ldots, h_N]$ be a parity check matrix of a linear $(N,K)$ code. Define the
encoder states recursively as

$$S_0 = 0, \qquad S_t = S_{t-1} + I_t h_t. \tag{29}$$

Since a codeword is in the null-space of $H$ ($I_1^N H^{\mathsf{T}} = S_N = 0$), the trellis encoder
needs to keep only the trajectories leading to state $S_N = 0$. If the code is systematic, the
first $K$ symbols are the information symbols $I_1^K$ given by the source. The parity check symbols
$I_{K+1}^N$ are uniquely defined by the path leading from state $S_K$ to state $S_N = 0$. If the
source is binary, there are only two possible transitions from the states corresponding to
information bits.

Equation (29) has the form of the first equation of (9). In block-code-based TCM, segments
of a codeword are mapped into the modulator constellation. If we have a discrete channel,
the modulator is considered to be a part of the channel and the encoded symbols are transmitted
directly to the channel. In this case, the second equation of (9) takes the form $X_t = I_t$. Thus,
both equations (9) are satisfied and we can apply the EM decoding algorithm described in this
section if the channel is modeled by an HMM.

To be more specific, consider a (5,3) block code with the parity check matrix

$$H = \begin{bmatrix} 1 & 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 & 1 \end{bmatrix}.$$

The code trellis diagram is depicted in Fig. 1. This code has four states: (00), (01), (10), and
(11). The encoder output bits mark the corresponding state transition edges: the (horizontal)
transitions between the same states correspond to 0 and transitions between different states
correspond to 1. For example, for the first observation $Y_1$ we have $h_1 = (1,0)$ which is the
first column of the matrix $H$.
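
The trellis description of this (5,3) code can be checked by running the state recursion (29) over all binary words of length five. In the sketch below the function names are illustrative, and the parity check columns are those of the matrix $H$ given above.

```python
from itertools import product

H_COLUMNS = [(1, 0), (1, 1), (0, 1), (1, 0), (0, 1)]   # h_1, ..., h_5

def final_state(bits):
    """Run S_t = S_{t-1} + I_t h_t (mod 2) starting from S_0 = (0, 0)."""
    state = (0, 0)
    for bit, h in zip(bits, H_COLUMNS):
        state = tuple((s + bit * hk) % 2 for s, hk in zip(state, h))
    return state

def codewords():
    """Codewords are the words whose trellis path ends in the zero state."""
    return [w for w in product((0, 1), repeat=5) if final_state(w) == (0, 0)]

def hamming(a, b):
    return sum(u != v for u, v in zip(a, b))

received = (0, 1, 0, 1, 0)
syndrome = final_state(received)
nearest = sorted(c for c in codewords() if hamming(c, received) == 1)
print(len(codewords()), syndrome, nearest)
```

The word 01010 ends in state (01), so it is not a codeword, and two codewords lie at Hamming distance one from it because the columns $h_3$ and $h_5$ are equal; either of them can serve as an initial approximation for the iterative decoder.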

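Suppose that the channel is described by the Gilbert-Elliott model represented by
equations (6) and (7). According to these equations, we have

$$\mathbf{P}_c(0 \mid 0) = \mathbf{P}_c(1 \mid 1) = \begin{bmatrix} p_{11}(1-b_1) & p_{12}(1-b_2) \\ p_{21}(1-b_1) & p_{22}(1-b_2) \end{bmatrix}, \qquad
\mathbf{P}_c(0 \mid 1) = \mathbf{P}_c(1 \mid 0) = \begin{bmatrix} p_{11} b_1 & p_{12} b_2 \\ p_{21} b_1 & p_{22} b_2 \end{bmatrix}.$$

Let us assume that the sequence $Y_1^5 = 01010$ was received. As an initial approximation of the
decoded sequence we choose a codeword closest to the received sequence $Y_1^5$, $X_{1,0}^5 = 01011$.
For this codeword, we compute

$$\boldsymbol{\alpha}(Y_1^1 \mid 0) = \boldsymbol{\pi}_c \mathbf{P}_c(0 \mid 0), \quad
\boldsymbol{\alpha}(Y_1^2 \mid 01) = \boldsymbol{\alpha}(Y_1^1 \mid 0)\mathbf{P}_c(1 \mid 1), \quad
\boldsymbol{\alpha}(Y_1^3 \mid 010) = \boldsymbol{\alpha}(Y_1^2 \mid 01)\mathbf{P}_c(0 \mid 0),$$

$$\boldsymbol{\alpha}(Y_1^4 \mid 0101) = \boldsymbol{\alpha}(Y_1^3 \mid 010)\mathbf{P}_c(1 \mid 1), \quad
\boldsymbol{\alpha}(Y_1^5 \mid 01011) = \boldsymbol{\alpha}(Y_1^4 \mid 0101)\mathbf{P}_c(0 \mid 1).$$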



This completes the forward part. In the backward part, we compute 



Turin: MAP decoding in channels with memory 






$$L(S_4 = 00) = m(X_5 = 0) = \gamma_{5,1} \log(1 - b_1) + \gamma_{5,2} \log(1 - b_2)$$

$$L(S_4 = 01) = m(X_5 = 1) = \gamma_{5,1} \log b_1 + \gamma_{5,2} \log b_2$$

where $\gamma_{5,i} = \alpha_i(Y_1^5 \mid 01011)\beta_i$ and $\beta_i = 1$. Then we compute
$\boldsymbol{\beta}(Y_5^5 \mid 1) = \mathbf{P}_c(0 \mid 1)\mathbf{1}$ and

$$L(S_3 = 00) = m(X_4 = 0) + m(X_5 = 0), \qquad L(S_3 = 01) = m(X_4 = 0) + m(X_5 = 1),$$

$$L(S_3 = 10) = m(X_4 = 1) + m(X_5 = 0), \qquad L(S_3 = 11) = m(X_4 = 1) + m(X_5 = 1).$$

In the next step, we need to apply the Viterbi algorithm: choose $X_3 = 0$ if

$$L(S_2 = 00) = L(S_3 = 00) + m(X_3 = 0) > L(S_3 = 01) + m(X_3 = 1).$$

Otherwise, we choose $X_3 = 1$ and $L(S_2 = 00) = L(S_3 = 01) + m(X_3 = 1)$, and so on. At the end
of this iteration, we find the longest path connecting the nodes at $t = 0$ and $t = 5$. The
corresponding encoder output sequence is the next approximation of the decoder output. The process
continues until two consecutive iterations deliver the same result.
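
The backward survivor recursion used in this example can be sketched on the same (5,3) code trellis. The branch metrics in `METRICS` are hypothetical numbers standing in for the values $m(X_t)$ that the forward-backward part would supply, and the function names are illustrative; the recursion itself follows the pattern $L(S_{t-1}) = \max\{L(S_t) + m(X_t)\}$ with the terminal state forced to zero.

```python
H_COLUMNS = [(1, 0), (1, 1), (0, 1), (1, 0), (0, 1)]   # h_1, ..., h_5
STATES = [(0, 0), (0, 1), (1, 0), (1, 1)]
NEG_INF = float("-inf")

def step(state, bit, h):
    """State transition S_t = S_{t-1} + I_t h_t (mod 2)."""
    return tuple((s + bit * hk) % 2 for s, hk in zip(state, h))

def backward_viterbi(metrics):
    """Longest path from S_0 = 0 to S_N = 0; metrics[t][x] is the branch metric."""
    n = len(H_COLUMNS)
    # Only the zero terminal state is allowed.
    length = {s: (0.0 if s == (0, 0) else NEG_INF) for s in STATES}
    choice = []
    for t in reversed(range(n)):
        new_length, best_bit = {}, {}
        for s in STATES:
            scores = [length[step(s, x, H_COLUMNS[t])] + metrics[t][x] for x in (0, 1)]
            best_bit[s] = 0 if scores[0] >= scores[1] else 1
            new_length[s] = scores[best_bit[s]]
        length = new_length
        choice.insert(0, best_bit)
    # Read the survivor forward, starting from the zero state.
    state, word = (0, 0), []
    for t in range(n):
        bit = choice[t][state]
        word.append(bit)
        state = step(state, bit, H_COLUMNS[t])
    return tuple(word), length[(0, 0)]

METRICS = [[-0.1, -2.0], [-1.5, -0.2], [-0.3, -0.9], [-1.1, -0.4], [-0.2, -1.0]]
word, value = backward_viterbi(METRICS)
print(word, value)
```

Running the recursion backward means that the survivor lengths can be updated in the same sweep that produces the backward probability vectors, which is why the paper combines the two.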

5.2 MAP Decoding of Convolutional Codes

As a second example, let us consider a convolutional code. The encoder operation can be
described by the following equations

$$Z_t = S_t^{(s)} \mathbf{C} + I_t \mathbf{G}, \quad t = 1, 2, \ldots \tag{30}$$

where $S_t^{(s)}$ is the encoder state vector and $Z_t$ is the encoder output vector. The encoder
initial state is usually selected as $S_1^{(s)} = 0$.

In the TCM standard implementation, the encoded symbols are transformed by a memoryless
mapper to produce a symbol in the modulator constellation:

$$X_t = F(Z_t) = F(S_t^{(s)} \mathbf{C} + I_t \mathbf{G}) = g_t(S_t^{(s)}, I_t). \tag{31}$$

As we can see, equations (30) and (31) have the form of equations (9). If we assume that
symbols $X_t$ are transmitted over a channel represented by equation (6), then we can apply the
EM algorithm described in this section.

To be more specific, suppose that the source is binary, memoryless, and symmetric
($p(0) = p(1) = 0.5$). The source bits are coded by the rate 1/2 convolutional encoder shown in
Fig. 2.

If we denote the encoder output vector as $Z_t = [z_{t1}\ z_{t2}]$, then according to this figure we
have

$$S_{t+1}^{(s)} = S_t^{(s)} \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} + I_t\, [1\ \ 0], \qquad
Z_t = S_t^{(s)} \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix} + I_t\, [1\ \ 1] \tag{32}$$

where the encoder state is defined by the contents of its shift registers which is (see Fig. 2)
$S_t^{(s)} = [I_{t-1}, I_{t-2}]$ and its state transition probability matrix has the form

$$\begin{bmatrix} 0.5 & 0 & 0.5 & 0 \\ 0.5 & 0 & 0.5 & 0 \\ 0 & 0.5 & 0 & 0.5 \\ 0 & 0.5 & 0 & 0.5 \end{bmatrix}.$$
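
A minimal sketch of such an encoder is given below. Fig. 2 is not reproduced here, so the tap connections (output bits $I_t + I_{t-1} + I_{t-2}$ and $I_t + I_{t-2}$ modulo 2) are an assumption inferred from the state transition and output structure used in this example; the function names and the decimal symbol labeling $2 z_{t1} + z_{t2}$ are likewise illustrative.

```python
def encode(bits):
    """Rate-1/2 encoder with state (I_{t-1}, I_{t-2}).

    Returns the two-bit output symbols in decimal form and the final state.
    """
    s1 = s2 = 0                 # the encoder starts in the zero state
    symbols = []
    for b in bits:
        z1 = b ^ s1 ^ s2        # assumed taps: I_t + I_{t-1} + I_{t-2}
        z2 = b ^ s2             # assumed taps: I_t + I_{t-2}
        symbols.append(2 * z1 + z2)
        s1, s2 = b, s1          # shift register update
    return symbols, (s1, s2)

def transition_matrix():
    """State transition probabilities for equiprobable input bits.

    States are numbered 2*I_{t-1} + I_{t-2}.
    """
    m = [[0.0] * 4 for _ in range(4)]
    for state in range(4):
        s1 = state >> 1
        for b in (0, 1):
            nxt = (b << 1) | s1
            m[state][nxt] += 0.5
    return m

symbols, last_state = encode([0, 1, 0, 0, 0])
print(symbols, last_state)
for row in transition_matrix():
    print(row)
```

With three information bits followed by two zero bits, the encoder returns to the zero state, which is the terminated configuration considered later in this subsection.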



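The encoder output symbols are mapped to the QPSK constellation which consists of four
symbols: $X^{(0)} = (-A, -A)$, $X^{(1)} = (-A, A)$, $X^{(2)} = (A, -A)$, and $X^{(3)} = (A, A)$.
Equation (16) takes the form

$$\mathbf{P}(I = 0, Y) = 0.5 \begin{bmatrix}
\mathbf{P}_c(Y \mid X^{(0)}) & 0 & 0 & 0 \\
\mathbf{P}_c(Y \mid X^{(3)}) & 0 & 0 & 0 \\
0 & \mathbf{P}_c(Y \mid X^{(2)}) & 0 & 0 \\
0 & \mathbf{P}_c(Y \mid X^{(1)}) & 0 & 0
\end{bmatrix}, \qquad
\mathbf{P}(I = 1, Y) = 0.5 \begin{bmatrix}
0 & 0 & \mathbf{P}_c(Y \mid X^{(3)}) & 0 \\
0 & 0 & \mathbf{P}_c(Y \mid X^{(0)}) & 0 \\
0 & 0 & 0 & \mathbf{P}_c(Y \mid X^{(1)}) \\
0 & 0 & 0 & \mathbf{P}_c(Y \mid X^{(2)})
\end{bmatrix} \tag{33}$$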



Similarly to the previous example, we can apply the EM algorithm of this section to decoding of
a terminated convolutional code. If a code sequence is too long, an approximate fixed-lag
algorithm can be applied in a forward-only fashion. If the channel is described by equation
(6), we can apply the EM algorithm whose maximization part is performed using the Viterbi
algorithm with the branch metric (24).

However, if a bit-level model is described by equation (6), it does not mean that the encoded
symbol model has this form. In this case we need to apply the more complex metric (21). To be
more specific, suppose that the channel is described by the Gilbert-Elliott model with parameters
[3,10]

$$\mathbf{P}_c = \begin{bmatrix} 0.997 & 0.003 \\ 0.034 & 0.966 \end{bmatrix}, \qquad b_1 = 0.001, \qquad b_2 = 0.16.$$

Thus, according to equation (6), we have

$$\mathbf{P}_c(0) = \begin{bmatrix} 0.996003 & 0.002520 \\ 0.033966 & 0.811440 \end{bmatrix}, \qquad
\mathbf{P}_c(1) = \begin{bmatrix} 0.000997 & 0.000480 \\ 0.000034 & 0.154560 \end{bmatrix}.$$



Suppose that the code starts and terminates at the state 00 and has three information bits. The
encoder trellis diagram along with its output/input symbols are depicted in Fig. 3. For every
information bit, the encoder produces a two-bit symbol which is presented in decimal form in
Fig. 3. Thus, we have the following matrix probabilities of errors in the received two-bit
symbols:

$$\mathbf{P}(0) = \mathbf{P}_c^2(0) = \begin{bmatrix} 0.992108 & 0.004555 \\ 0.061392 & 0.658520 \end{bmatrix}, \qquad
\mathbf{P}(1) = \mathbf{P}_c(0)\mathbf{P}_c(1) = \begin{bmatrix} 0.000993 & 0.000868 \\ 0.000061 & 0.125432 \end{bmatrix},$$

$$\mathbf{P}(2) = \mathbf{P}_c(1)\mathbf{P}_c(0) = \begin{bmatrix} 0.001009 & 0.000392 \\ 0.005284 & 0.125416 \end{bmatrix}, \qquad
\mathbf{P}(3) = \mathbf{P}_c^2(1) = \begin{bmatrix} 0.000001 & 0.000075 \\ 0.000005 & 0.023889 \end{bmatrix}.$$
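
These two-bit matrices are products of the bit-level matrices, which can be verified directly. In the sketch below the helper names are illustrative, and the error pattern is assumed to be labeled as $e = 2 e_1 + e_2$ with $e_1$ the first transmitted bit of the symbol, so that $\mathbf{P}(e) = \mathbf{P}_c(e_1)\mathbf{P}_c(e_2)$.

```python
def mat_mul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

# Bit-level matrices P_c(0) (no error) and P_c(1) (error) quoted in the text.
PC = {
    0: [[0.996003, 0.002520], [0.033966, 0.811440]],
    1: [[0.000997, 0.000480], [0.000034, 0.154560]],
}

def symbol_matrix(e):
    """Matrix probability of the two-bit error pattern e = 2*e1 + e2."""
    return mat_mul(PC[e >> 1], PC[e & 1])

for e in range(4):
    print(e, [[round(v, 6) for v in row] for row in symbol_matrix(e)])
```

Because matrix multiplication is not commutative, $\mathbf{P}(1)$ and $\mathbf{P}(2)$ differ even though both describe a single bit error within the symbol.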



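These matrices do not satisfy equation (6); therefore, we cannot use the simplified metrics as in
the previous examples and must use the more complex metric (21). Equation (16) takes the form

$$\mathbf{P}(I = 0, Y) = 0.5 \begin{bmatrix}
\mathbf{P}_c(Y) & 0 & 0 & 0 \\
\mathbf{P}_c(Y \oplus 3) & 0 & 0 & 0 \\
0 & \mathbf{P}_c(Y \oplus 2) & 0 & 0 \\
0 & \mathbf{P}_c(Y \oplus 1) & 0 & 0
\end{bmatrix}, \qquad
\mathbf{P}(I = 1, Y) = 0.5 \begin{bmatrix}
0 & 0 & \mathbf{P}_c(Y \oplus 3) & 0 \\
0 & 0 & \mathbf{P}_c(Y) & 0 \\
0 & 0 & 0 & \mathbf{P}_c(Y \oplus 1) \\
0 & 0 & 0 & \mathbf{P}_c(Y \oplus 2)
\end{bmatrix} \tag{34}$$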



where $\oplus$ denotes the bitwise exclusive-or operation. Suppose that we received
$Y_1^5 = (0,3,2,1,0)$. As an initial guess for the transmitted codeword we choose
$X_{1,0}^5 = 0,0,0,0,0$. Using the algorithm described earlier in this section, we can find a
maximum of $Q(I_1^5, I_{1,p}^5)$. Obviously, it is equivalent to finding a minimum of
$-Q(I_1^5, I_{1,p}^5)$ which can be achieved using the Viterbi algorithm by finding the shortest
path on the trellis using the negative metric in equation (21). After the first iteration we
obtained the following lengths of the shortest paths connecting each node of the trellis with its
terminal node:



(all entries are in units of $10^{-3}$; a dash marks a node that is absent from the trellis)

state    t = 1     t = 2     t = 3     t = 4     t = 5
0        0.074     0.0618    0.0783    0.0443    0.0101
1        -         -         0.0784    0.0442    0.0731
2        -         0.0896    0.0511    0.1344    -
3        -         -         0.0511    0.0800    -

The decoded sequence is $\hat{I}_1^5 = 0,1,0,0,0$ and the corresponding transmitted sequence is
$X_{1,1}^5 = 0,3,2,3,0$. After the second iteration we obtained the following lengths:



state    t = 1     t = 2     t = 3     t = 4     t = 5
0        0.0248    0.0229    0.0344    0.0177    0.0020
1        -         -         0.0341    0.0171    0.0289
2        -         0.0382    0.0204    0.0567    -
3        -         -         0.0204    0.0319    -

and the same decoded sequence as in the previous iteration, $\hat{I}_1^5 = 0,1,0,0,0$.

As a test, we performed an exhaustive search which gave the same result with a maximum
likelihood value of 0.940201. For the HMM fading channel with AWGN we can use metric
(28).

6. CONCLUSION 

We have demonstrated that if an information source, encoder, and communication channel 
are modeled by IOHMMs, then MAP decoding can be realized using the EM algorithm. The 
expectation part of the algorithm is performed using the forward-backward algorithm while the 
maximization part is accomplished using the Viterbi algorithm. The EM algorithm is robust and,
in contrast with some other iterative algorithms, it converges to the APP maximum in all
practical cases. Because the set of transmitted symbols is discrete, the number of necessary
iterations is usually small, which was confirmed by a direct simulation.